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L E X I N GTON 


M \ SS  \(  HI  SETTS 


A NUMERICAL  STUDY  OF  EIGENVALUE  COMPUTATION 


ABSTRACT 

A study  of  different  numerical  methods  for  computing  the  eigenvalues  of 
Hermitian  matrices  has  been  carried  out.  The  Givens-Householder  procedure 
was  found  to  be  the  fastest,  most  accurate  method.  Relationships  among 
eigenvalue  accuracy,  word  size,  form  of  arithmetic,  hardware  implementation, 
and  speed  are  discussed. 
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1.0  INTRODUCTION 


In  certain  communications  systems  the  ability  to  rapidly  and  accurately 
solve  the  algebraic  eigenvalue  problem  is  of  extreme  importance.  Where  the 
location  of  transmitters  is  to  be  determined  from  a set  of  N array  output 
readings,  one  possible  signal  processing  formulation  leads  to  the  analysis  of 
an  N by  N complex,  Hermitian*  matrix  [10].  The  elements  of  this  matrix  R arc 
complex  since  sensor  readings  generate  both  magnitude  and  phase  information. 
This  matrix  R is  a correlation  matrix  formed  by  taking  the  expectation  E(  ) 
of  the  matrix  (x^Hx^*)  where  x.  is  a vector  of  length  N corresponding  to 
samples  of  the  N array  elements  at  time  j and  t represents  vector  transpose 
tion.  Thus 


R = E[(Xj)(x  *)*]  . 

Since  the  signal  received  at  any  array  element  is  a linear  combination  of  the 
transmitted  signals  from  many  possible  transmitters,  the  ability  to  uniquel v 
identify  the  number  of  transmitters  is  limited  by  the  number  of  array  sensor  . 
That  is,  with  N sensors  it  is  only  possible  to  identify  at  most  N supposedly 
distinct  transmitters. 

If  a given  amount  of  amplification  (or  power)  is  available  in  the  forn. 
of  weights  to  be  assigned  to  each  of  the  possible  array  outputs,  then  at  mo 
N-l  weights  may  be  specified  independently.  Mathematically,  if  the  weights 
are  represented  by  a complex  vector  W and  the  total  power  available  for  wci>. 
ing  is  constrained,  then  this  is  equivalent  to  the  constraint  equation 


WtW*  = 1 


If  we  desire  that  as  many  of  the  transmitted  signals  as  possible  be  nulled, 
so  as  to  maximize  (or  minimize)  the  received  power,  it  is  a classical  re 


*A  Hermitian  matrix  R has  elements  r.  . such  that  r.  . r ; thus  the  dm. 

, , i i i l u 

elements  are  pure  real. 
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of  matrix  tiieory  (the  Rayleigh  quotient)  that  this  can  be  achieved  by  choosing 
as  a weight  vector  the  eigenvector  of  R with  the  largest  (or  smallest)  eigen- 
value. Thus  the  problem  of  finding  suitable  weights  for  received  signal 
vectors  x.  may  be  recast  into  the  problem  of  finding  the  eigenvalues  and 
eigenvectors  of  the  correlation  matrix  R associated  with  the  vector  x.. 

The  rapid  and  accurate  computation  of  these  quantities  is  the  subject 
of  this  note. 


1.1  Algebraic  Eigenvalue  Problem 

The  determination  of  the  eigenvalues  and  eigenvectors  for  an  arbitrary 
matrix  is  known  as  the  algebraic  eigenvalue  problem  and  its  solution  in  both 
the  theoretical  and  computational  sense  has  received  a considerable  amount  of 
attention.  The  classic  discussion  of  the  various  techniques  available  for  the 
solution  of  this  problem  is  the  book  by  J.  li.  Wilkinson,  The  Algebraic  Eigen- 
value Problem,  Oxford  University  Press  (1965)  [1]. 

The  various  techniques  available  read  like  a flow-chart  in  terms  of  the 
nature  of  the  matrix  to  be  solved.  Assuming  that  the  matrix  is  not  degenerate: 

1.  Is  the  matrix  real  or  complex? 

2.  If  complex,  is  it  Hermitian? 

3.  Are  only  the  eigenvalues  required  or  must  the  eigenvectors 
be  determined  as  well? 

4.  How  much  storage  is  available? 

5.  How  many  bits  of  accuracy  in  the  results  arc  required? 

6.  How  many  operations  are  required  for  the  computation? 

1 . 2 Hermitian  J«lat rices_  and  Their  Solution 

The  matrices  we  will  be  dealing  with  are  complex  Hermitian.  In 
general  we  will  need  the  eigenvectors  as  well  as  the  eigenvalues  for  t lie 
entire  matrix.  Possible  dynamic  strategies  when  not  all  the  eigenvectors 
may  be  required  will  be  discussed  in  a later  section.  The  reduction  of  tht 
scope  of  the  problem  to  this  specific  type  of  matrix  leads  to  the  conclusion 


that  there  are  really  three  major  algorithms  available  for  the  computation  of 
the  eigenvalues  and  eigenvectors:  ( 1 ) the  Jacobi  algorithm,  (2j  the  Hessenberg/ 

QR  procedure,  and  (.3)  the  Givens-Householder  algorithm.  These  algorithms  are 
iterative  procedures  which  converge  to  the  correct  answer.  They  will  be  dis- 
cussed in  detail  in  later  sections.  All  of  these  techniques  are  based  on  the 
following  theorem: 

If  B is  any  n x n non-singular  matrix,  then  A and  B *AB  have 
the  same  eigenvalues.  Moreover,  if  x is  an  eigenvector  of 
B ^AB,  then  Bx  is  an  eigenvector  of  A. 

This  transformation  from  A to  B ^AB  is  called  a similarity  transformation. 

The  Jacobi,  Hessenberg/QR,  and  Givens-Householder  technique  are  based  on  proper 
choices  for  the  matrix  B. 

Almost  all  of  the  numerical  analysis  performed  on  this  problem  were  done 
on  an  IBM  370/168  equipped  with  the  basic  CPU.  For  reasons  that  will  be 
explained  later  this  represents  a realistic  configuration  for  a lower  bound 
on  the  computation  for  any  serial  processor.  Timing  figures  for  computations 
performed  on  this  system  will  be  used  extensively  in  the  following  sections. 

2.0  MATRIX  ALGORITHMS 

The  algorithms  discussed  below  have  been  implemented  at  the  Lincoln  lab- 
oratory Computation  Center.  Complete  discussions  of  them  can  be  found  in 
[5,6,8],  In  addition  ALGOL  versions  of  the  programs  are  in  [7].  Where  com- 
parison between  the  IBM  370/168  and  Lincoln  Laboratory  DIG  11/45  were  per- 
formed, the  identical  Givens-Householder  FORTRAN  routines  were  run  on  the  same 
matrices . 

2.1  The  Jacobi  Algorithm 

The  Jacobi  algorithm  operates  by  finding  a sequence  of  similarity 
transformation0  such  that  the  transformed  matrix  converges  to  a diagonal 
matrix  A whose  elements  o».  the  diagonal  represent  the  eigenvalues  ( X ^ , >,  .... 
A^J.  As  an  example  in  two  dimensions  consider  the  matrix: 


A 


R 


r r 
11  12 


r21  r22 


One  possible  similarity  transformation  may  bo  determined  by  observing  that  i 
plane  rotation  of  the  coordinates  space,  ( x ^ , x ) , to  a new  space  I.',. 
he  performed  in  such  a way  that  the  terms  r ^ and  r^  are  annihilated.  Ihi- 
rotat ion  can  be  represented  by 


V ~ 

' 1 

cosO 

-s  i n0 

V 

>2 

s in0 

COS0 

x2 

hxpress ions  for  determining  the  values  of  cos6  and  sind  for  this  annihilation 
may  be  found  in  [2],  The  use  of  this  algorithm  for  llermitian  matrices  of  si  a 
X by  N depends  on  successive  rotation  of  pairs  of  axes  until  the  sum  of  the 
-quares  of  t he  off-diagonal  terms  goes  to  zero  (or  at  least  less  than  some 
predetermined  threshold).  The  cross-term  element  r that  is  to  In  annihi 

pq 

lated  during  any  given  iteration  is  called  the  pivot  for  the  plane  rot  it' 
and  . oldstine,  Murray,  and  Van  Neumann  |S|  showed  that  it  the  pivot  clio 
for  each  iteration  has  magnitude  greater  than  tin  average  magni t tide  fin  .a. 
off  diagonal  element  then  R will  converge  to  A. 

\ computer  program  for  the  calculation  of  tin  i ip  iiulucs  and  ei  \t  t. 
of  a real  symmetric  matrix  using  the  dacohi  proceduri  i ■ avai  1 al • 1 • • in  ••OR'II.' 
in  the  IBM  Scientific  Subroutine  Package  (TIT, IX).  (einplex  llermitian  i itii.i 
may  he  analyzed  by  noting  that  R can  be  written  as: 


the  matrix  II  = 


\ . X..  X.  A 


t :i 


C * il)  where  C,  l>  R 

which  is  2X  by  2\  may  be  sis  m to  have  e ip  is  > 'a.- 
, ...  \ a re  the  c i i > nv  a I ; a • 


. . , \ , \ ) , where  i , , 
’ n n 


Not  much  more  will  be  said  about  the  Jacobi  procedure  because  it  has  a 
maior  problem  in  terms  of  the  number  of  computations  required.  In  general  an 
infinite  number  of  iterations  (sweeps)  are  required  to  produce  a true  diagonal 
matrix.  In  practice  while  the  residual  off-diagonal  terms  may  only  have  to 
satisfy  some  threshold  condition  (e.g.,  see  above),  a term  r which  is  anni- 
hilated on  one  iteration  will  most  likely  be  resurrected  on  a later  iteration. 
Thus  convergence  proceeds  at  a relatively  slow  rate  and  the  computation 
required  is  large. 

2 . 2 The  Hessenberg/QK  Algorithm 

This  procedure  represents  the  concatenation  of  two  procedures 
the  determination  of  eigenvalues  and  eigenvectors.  The  first  procedure 
a general  (not  necessarily  Hermitian)  matrix  to  a Hessenberg  form,  that 
matrix  11  for  which  lu  . =0  if  j < i-1.  Thus  for  a 4 x 4 matrix  we  have 


11 

h12 

h23 

h14 

21 

^22 

h23 

''24 

0 

h52 

h33 

h34 

0 

0 

h43 

h44 

The  second  procedure  uses  a series  of  unitary  similarity  transformations  to 
reduce  the  Hessenberg  matrix  to  an  upper  triangular  matrix  which  displays  the 
eigenvalues  on  the  main  diagonal.  Let  11  be  the  matrix  in  Hessenberg  form  to 
be  analyzed , R he  an  upper  triangular  matrix  and  Q a unitary  matrix,  then  if 
II  is  non-singular  there  exists  a unique  decomposition 


time 


for 

reduces 
is  a 


II 


QR 


The  the i rem  happens  to  be  true  for  any  non-singular  matrix  H but  the  mrJ 
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comput  it  ions  is  reduced  from  N to  N if  H is  a Hessenberg  matrix.  ine  pmice 
of  finding  the  eigenvalues  is  based  upon  application  of  the  iterative  count  1 .... 


II. 

l 


H.  , 
i+l 


Q.R. 

xi  i 

q:1  h.q. 

u 1X1 


= R.Q. 

l xi 


An  example  of  how  to  determine  Q.  and  R is  given  in  [2,  p.  922J.  If  H. 
(the  initial  Hessenberg  matrix)  has  eigenvalues  satisfying  | A ^ j > ] \ , 1 •••• 
j 1 > 0 the  sequence  {if}  will  converge  to  an  upper  triangular  matrix  with, 
the  eigenvalues  on  the  main  diagonal.  A major  risk  incurred  by  the  use  of 
this  routine  is  that  when  any  pair  of  eigenvalues  are  approximately  equal , the 
procedure  may  not  converge. 

The  use  of  a reduction-to-llessenberg  algorithm  followed  by  application  of 
the  QR  procedure  is  important  since  it  represents  a standard  technique  for  the 
analysis  of  these  complex  matrices.  At  Lincoln  Laboratory,  the  programs  ma>  be 
found  in  a standard  FORTRAN'  package  described  in  (e>J  and  called  by  the  n.sc 
LIU \ Unfortunately,  this  is  the  same  name  used  by  the  IBM  Scientific  s. :*  - 
routine  lockage  for  the  Jacobi  algorithm.  The  number  of  arguments  used  in 
the  minor  subroutine  call  to  EIGEN  differs  (5  for  Hessenberg/QR,  4 for  Am  1 * 
however,  making  it  possible  to  detect  which  procedure  is  being  calleu  in  .. 
given  program. 

Finally,  it  should  be  noted  that  although  the  Hessenberg/QR  is  con  idcr 
ably  faster  than  the  Jacobi  procedure  and  hence  is  needed  to  provide  ha  el  me 
computation  tune  measurements,  it  does  rot  take  full  advantage  of  the 
tries  inherent  in  a llermitian  matrix  and  thus  is  not  the  fastest  possible 
algori  thin. 


(> 


Fhe  Givens -Householder  Algorithm 


The  Jacobi  algorithm  operates  to  transform  a correlation  matrix  R to 
a diagonal  matrix  A.  The  Givens-Householder  procedure  transforms  R to  a sym- 
metric tridiagonal  matrix,  T,  whose  elements  t.  are  non-zero  for  at  most 

1 J 

|i-jj  1.  In  a sense  the  T matrix  is  a very  special  case  of  a Hessenberg 
matrix.  Thus  after  the  reduction  one  possible  way  to  determine  the  eigen- 
values is  to  apply  the  QR  algorithm.  At  least  two  other  techniques  are  avail- 
able for  finding  the  eigenvalues:  the  Sturm  sequence  approach  |1,“]  and  the 

Q|.  method  (a  predecessor  of  the  (JR  method)  which  is  the  technique  actually 
used  for  the  work  reported  here  [1,2,7].  The  two  greatest  virtues  of  the 
Givens-Householder  techniques  are  (1)  its  speed  --  using  the  smallest  number 
of  computations  to  get  to  the  point  where  the  eigenvalue/eigenvector  routine 
is  actually  called,  and  (2)  its  excellent  numerical  stability  and  accuracy. 

A detailed  explanation  of  the  method  by  which  the  symmetric,  tridiagonal 
matrix  is  produced  will  not  be  given  here.  Instead  the  reader  is  referred  to 
| 2,  p.  901;  3,  p.  169]. 

2 . 1 Compari son  of  Methods 

The  best  comparison  of  the  Jacobi  to  Givens-Householder  approach  is 
given  in  |1,  p.  334 | . In  every  aspect  (speed,  accuracy,  storage,  solution  for 
i limited  number  of  eigenvalues,  etc.)  the  Givens-Householder  approach  is 
superior. 

In  comparing  the  Givens-Householder  method  to  the  Hessenberg - QR  approach, 
we  find  that  the  latter  falls  between  the  Jacobi  and  Givens-Householder  method 
in  value.  This  really  is  not  surprising  since  in  the  computation  of  numerical 
quantities  the  fewer  the  number  of  computation  cycles  required  the  greater  thi 
accuracv  , thus  the  fastest  algorithm  is  most  often  the  most  accurate.  The 
: • on  for  this  is  quite  clear;  every  multi  pi > operation  and  add  operation 
introduces  the  possibility  of  noise  through  either  overflow  or  round-off. 
tin  fewer  the  number  of  operations  the  taster  and  more  accurate  the  result. 


Thus 


In  tin.  next  section  we  shall  deal  with  some  numerical  evaluations  of 
specific  matrices  and  comparisons  of  accuracies  and  computations  for  various 
algorithms  and  processors. 

3.0  I. Si 1M\T ION  OF  COMPUTATION  TIME  AND  ACCURACY 

To  compare  various  algorithms  and  different  machine  configurations  for 
computation  time  and  accuracy,  a series  of  experiments  were  performed. 

3 . 1 Complex  vs.  Real  Arithmetic 

Wilkinson  has  pointed  out  [7]  that  it  is  not  uncommon  for  software 
packages  that  implement  complex  arithmetic  to  take  significantly  longer  than 
might  be  expected.  If  we  assume  that  a multiply  takes  approximately  five 
times  the  length  of  time  required  for  an  add,  then  a complex  multiply  should 
take  four  to  five  times  the  length  of  an  ordinary  (real)  multiply.  It  is 
not  unusual,  though,  to  find  software  packages  that  do  considerably  worse 
than  this.  To  guard  against  this  possibility  on  the  IBM  370/168  a simple 
test  was  run  to  compare  the  speed  of  analysis  for  a 3 * 3 Hermitian  matrix 
II  - A + iB  against  the  6x6  real  matrix  formed  by: 

■ 

A -B 

B A 
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Using  the  fastest  possible  algorithm  (i.e.,  the  Givens  Householder  tin  o , 
analysis  took  276  clock  times  versus  the  real  analysis  which  took  1138  if  i 
times  (each  clock  interval  equals  13  microseconds).  The  complex  matrix  p.s  I 
age  may  therefore  be  used;  the  analysis  of  N * N complex  matrices  is  - ignn'i 
cantl)  faster  than  the  analysis  of  2N  * 2N  real  matrices. 
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3.2  Hessenber-i/djR  vs.  H i yens  -llouschoi  Jer  C.umputiit  ion  limes 

The  two  procedures  were  run  on  identical  matrices  with  the  si:e  of 
the  matrix  N being  varied  over  the  range  2 £ N < 20.  The  number  of  clock  tint 
from  the  start  to  the  finish  of  each  matrix  analysis  was  recorded.  A polynomial 
regression  program  was  used  to  fit  the  computation  time  as  a function  of  matrix 
sice,  T(N).  A cubic  polynomial  was  found  to  provide  the  best  fit.  This  w.i  t 
he  expected  since  analysis  of  the  computer  programs  predicts  a computation  t i un- 
proportional to  N'\  For  the  data  collected  the  comparative  graphs  using  the 
cubic  fit  are  given  in  Fig.  1.  The  equations  for  the  two  curves  arc: 

Householder  --  R (N ) = 2.64N'^  23.54N*"  + 253. 2bN  375.03 

Hessenberg  --  T(N)  = 29.48N"'  — 358. 43N~  + 1905. 63N  + 2592.54 

where  computed  values  of  T(N]  are  in  clock  intervals  of  13  microseconds.  While 
this  is  not  meant  to  provide  exact  prediction  of  computation  time  for  arbitrary 
Hermitian  matrices,  it  does  show  an  approximate  11:1  improvement  in  speed  for 
large  N.  Further  the  Householder  procedure  clearly  dominates  for  N 3.  The 
coefficient  of  fit  of  the  two  cubic  polynomials  to  the  data  was  given  by  the 
correlation  coefficient  squared  which  in  each  case  was: 

Householder  --  p = .997 

•> 

Hessenberg  --  p“  = .999 


Only  the  values  of  computation  time  for  2 \ ■ 12  were  used  in  the  actual 

regression  analysis.  The  actual  values  for  \ - II  may  be  used  to  check  the 
prediction  accuracy  of  the  cubic  polynomials.  The  cubic  equation  for  the 
Householder  based  on  2 X < 12  seems  to  underpredict  the  computation  t ime  t 
at  most  5()°o  for  11  N • 20.  The  Hessenberg  equation  overpredicts  the  t i mi 
In  as  much  as  100...  In  any  case  the  Householder  procedure  still  clearlv  do: 
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5.5  Computational  Accuracy  of  the  Two  Procedures 

According  to  Wilkinson  |1]  each  procedure  is  numerically  quite  stable. 
To  compare  accuracies  the  following  matrix  was  analyzed: 
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The  three  eigenvalues  were  identical  through  at  least  the  first  14  decimal 
places,  and  this  exceeded  the  published  accuracy  of  these  eigenvalues  in 
Wilkinson  (7].  Similar,  high  accuracy  results  were  obtained  in  other  ma trie 

5.4  Analysis  of  Computat ional  Accuracy 

Wilkinson  has  done  extensive  research  into  finding  closed  form  ex; 
sions  for  the  accuracy  of  eigenvalues  computed  by  various  procedures.  In  '’in- 
sertion we  shall  deal  with  the  accuracy  of  the  .Jacobi  procedure  and  tin  accur.n 
of  the  Householder  procedure.  We  know  from  our  previous  discussion  that  tl* 
results  for  the  Hessenberg  will  fall  somewhere  between  these  two. 
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3.4.1  The  Jacobi  Procedure 


3.4.1. 1 Floating-Point  Calculations 

It  has  been  shown  that  the  RMS  error  in  the  estimate  ot  the 


eigenvalues  is  given  by  [1,  p.  279]: 


Z ‘vv2 


RMS  N 


< 108*2~VVJ  • ( 1 +9-2_t  » ! ' 


lExi 


where : 


N = matrix  size 


t = bits  of  mantissa 


X . = exact  eigenvalues 

p.  = computed  eigenvalues 


and  b sweeps  of  the  Jacobi  procedure  are  assumed.  For  reasonable  si  e t i vr 


t • 10)  we  have: 


so  that 


(1  + 9.2  ) 1 


‘kms  ?-  10a-2''  • N'V‘ 


But  this  is  the  most  pessimistic  estimate  of  the  error  and  if  we  expect  ■> 
reasonable  distribution  of  rounding  errors  then  a more  realistic  bound  i 


t „ 3/4 

1 RMS  ' U*  * N 


3.4. 1.2  Fixed -Point  Calculation 


For  fixed-point  computation  of  the  eigenvalue  a similar 

analysis  leads  to: 


- ms  s-  «-V"  v"‘: 


uhere : 

is  a constant  such  that  1 <.  K < 10 
t = number  of  bits 
N = matrix  size 

3.4.2  The  Householder  Procedure 

The  error  generated  in  this  method  may  be  broken  into  two  piece' 
corresponding  to  each  of  the  two  major  steps  in  the  analysis.  The  first  is  tin 
error  associated  with  the  computation  of  the  tridiagonal  matrix  from  the  orig- 
inal matrix.  The  second  is  the  error  associated  with  the  e i genanalvs i s of  tlu 
tridiagonal  matrix. 

3.4.2. 1 Error  for  Tridiagonal  Reduction 

3. 4. 2. 1.1  Floating-Point  Calculation 

Floating-point  calculations  may  be  done  either  with 
or  without  accumulation  of  partial  sums  in  double  precision.  That  is,  if  we 
are  computing  a dot  product  of  two  real  vectors  (x  * y),  then  the  computation 
o f : 

i = 1 

may  be  done  by  rounding-off  after  each  multiply  and  add  operation  (without 
accumulation)  or  bv  waiting  until  the  \ multiplies  and  adds  are  completed 
and  then  rounding-off  (with  accumulation).  If  accumulation  is  used  the  only 


double  precision  number  that  must  be  saved  is  the  partial  sum  (of  products'. 
The  difference  in  the  RMS  error  between  these  two  techniques  is  at  worst  a 
factor  of  N.  The  expressions  are  given  by: 

Floating-Pt.  with  Accumulation 


e RMS  * 40N2't 


Floating- ft.  without  Accumulation 


< < 32N22_t 

RMS 


The  constants  40  and  32  are  upper  bounds  and  actually  depend  on  details  of 
how  the  floating-point  math  is  done.  For  example,  small  numbers  should  be 
added  together  first. 

3.4.2. 1.2  Fixed-Point  Calculation 

In  fixed -point  computations  accumulation  with  doubl 
precision  of  partial  sums  may  likewise  he  used.  Unfortunately  no  simple 
expression  is  available  for  the  RMS  error.  Instead,  however,  a simple  expre 
sion  is  available  for  the  maximum  error.  The  results  are 


Fixed-Pt.  with  Accumulation 


( max 

I 


ma  x I A . - u . 
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Fixed-Pt.  without  Accumulation 


k,  and  k,  are  constants  that  once  again  depend  on  details  of  how  the  mathe- 
matics is  done  and  the  assumption  about  round-off  error  (i.e.  , worst-case, 

average,  etc.).  In  either  case  1 £ K-,,  k < 10.  We  will,  in  fact,  use 

4 1 3 

K,  = k^  = 2 as  a conservative  upper  bound. 

3. 4. 2. 2 bigenvalue  Error  for  Tridiagonal 

As  mentioned  earlier,  a number  of  techniques  exist  for 
determining  the  eigenvalues  of  a tridiagonal  matrix.  For  each  of  these 
methods,  however,  the  error  is  less  than  the  error  associated  with  determining 
the  tridiagonal  matrix.  In  the  Sturm  sequence  approach,  for  example,  the 
error  after  p iterations  is  given  by: 

|up+1  A|  < (15.06)2't  ♦ 3 • 2~P 

This  is  less  than  the  error  for  the  tridiagonal  matrix  by  a factor  of  N.  Thus 
we  may  conclude  that  the  error  associated  with  the  Householder  method  is  dom- 
inated by  the  error  involved  in  the  reduction  to  a tridiagonal  matrix. 


■ Summary  and 

Comparison 

For  the  Jacobi 

procedure 

we  have : 

Re  fe fences 
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For  the  Householder: 
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max 
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accumulation 


£ < 

max 


K32-V/2 


[1,  p.  299] 


In  comparing  .Jacobi  to  Householder  we  see  they  are  virtually  the  same  in  terms 
of  the  bounds.  Thus  we  will  focus  our  attention  solely  on  the  issue  of  the 
error  in  the  Householder  approach. 

While  the  application  referred  to  in  Section  1 needs  the  solution  to  the 
eigenvector  problem  to  determine  the  appropriate  weight  vectors,  we  will  use 
the  eigenvalue  problem  to  gain  insight  into  what  type  of  processor  power  is 
required  to  achieve  an  arbitrary  accuracy. 

If  we  assume  that  we  would  like  to  be  able  to  specify  the  eigenvalue  to 
12  bits,  i.e.,  one  part  in  4096,  then,  how  many  bits  t must  the  processor 
have  if  the  matrix  size  is  N?  All  of  our  error  bounds  are  of  the  form: 


If  we  require  that 


-t  v 

A 2 N < 


o-13 


then  wc  certainly  satisfy  the  conditions  of  our  problem.  Solving  for  t yields: 


j t > 13  + ' log,N  + log  .A 


(’lotting  tins  expression  as  a function  of  N with  the  various  strategies  as 
parameters  us.  have  Fig.  2.  We  see  that  in  the  worst  case  (fixed  pt . , without 
accumulation)  we  need  less  than  .30  bits  to  achieve  the  required  accuracy  when 
\ 20.  For  the  best  case  (floating  pt . , with  accumulation)  wc  need  more  than 
20  hits  when  N 20.  This  immediately  suggests  that  the  optimum  processor  (In 
hit)  in  terms  of  cost  and  speed  uses  double  precision  integer  (i.e.,  fixed  pt . > 
arithmetic.  In  terms  of  error  analysis  this  is  just  as  effective  <t  strateg>  a 


NUMBER  Of  BITS 


Fig.  2.  Comparison  of  the  number 
12-bit  accuracy  in  the  eigenvalue 
and  the  computational  mathematics 


of  bits  theoretically  required  to  achiex 
computation  as  a function  of  matrix  si:, 
used . 


using  floating  point  with  accumulation  since  it  reall}  is  not  possible  to  buy  a 
JO  bit  processor  and  floating  point  computations  take  significant  time  even  in 
hardware. * 

4.0  REAL-TIME  EIGENANALYSIS 

It  is  of  interest  to  determine  what  is  the  largest  matrix  that  could  be 
continually  analyzed  in  a given  amount  of  time.  t;or  example,  if  100  milli- 
seconds are  available,  what  is  the  largest  N such  that  the  complete  eigen- 
analysis  could  be  performed?  In  that  way  the  weight  (eigenvector)  could  be 
updated  10  times  per  second.  To  answer  this  we  first  look  at  the  results  from 
Section  3.2  for  the  Householder  analysis.  Replotted  as  in  Pig.  3 we  have  that 
in  100  ms  the  largest  value  is  N=15.  This  value  is  clearly  dependent  on  the 
hardware  configuration  of  the  processor  and  the  efficiency  of  the  language 
translator. 

4.1  The  IBM  570/108  as  a Serial  Processor 

The  computations  described  above  were  made  on  the  Lincoln  Laboratory 
IBM  370/168.  This  machine  has  a basic  machine  cycle  time  of  80  nanoseconds 
operating  out  of  its  16  general  registers.  Access  to  high  speed  buffer  storage 
takes  from  100  ns  (if  overlap  is  used)  to  240  ns.  Add  time  is  1 machine  cycle, 
(he  data  path  for  all  these  operations  is  8 bytes  (64  bits)  wide.  The  fixed- 
point  multiply  takes  780  ns;  the  floating-point  multiply,  1870  ns.  Since  there 
is  no  special  purpose  hardware  for  vector  or  matrix  operations  (that  is,  wi 

-y 

cannot  do  a parallel  x • y)  , we  consider  for  the  purpose  of  this  report  tli.it 
the  computer  is  strictly  serial  processing.  No  paralb  1 i sm  is  available  to  cut 
down  computation  time.  In  that  case  the  times  sited  ibovt  represent  just  about 
the  best  one  can  do  with  current  solid-state  hardware  including  special  pur:  o- 
processors.  (he  are  not  including  design  involving  tie  recently  ant  line  s ub- 
nonosecond  1 Cl  logic.)  Thus  the  370/168  represents  a realistic  lower  bound  for 
the  class  of  till  general  purpose  and  special  purpose  serial  processors. 

•It  might  be  useful,  however,  to  explore  one  or  two  of  the  24 -bit  machines 
with  specially  constructed  high-speed  floating  point  hardwart  with  double 
prec i s i on  for  accurnu 1 at i on . 
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I . 2 I mguage  I'rans  1 a t ion 

lo  take  advantage  of  the  370/168  hardware  speed  the  program--  executed 
on  it  must  make  efficient  use  of  the  general  purpose  registers,  index  register! . 
high-speed  buffer  storage,  overlapping,  and  hardware  multipliers.  All  of  the 
programs  were  written  in  FORTRAN  IV  and  compiled  on  li  EXTENDED  IBM  OS  COMP II. LR 
|4|.  Mi  is  compiler  has  optimization  features  to  improve  run-time  performance. 
The  programs  themselves  were  carefully  coded  to  maximize  performance. 

4.3  Summary  for  IBM  Analysis 

Since  we  have  here  the  coupling  between  an  efficient,  optimized  pro- 
gram and  compiler  and  an  extremely  high-speed  process  a we  may  treat  the 
results  in  Section  4.0  as  a realistic  upperbound  to  the  matrix  size  N that 
can  l>e  processed  in  a given  time  T.  To  go  to  the  other  end  of  the  spectrum 
we  run  the  same  programs  on  a POP  11/45  to  estimate  the  "slowest"  that  one 
would  have  to  settle  for  in  terms  of  modern  minicomputer  performance. 

4 . 4 POP  11/45  System 

The  DEC  POP  11/45  system  used  is  in  Lincoln  laboratory's  Croup  54. 

It  consists  of  64k  words  of  900  ns  core  memory  and  a number  of  I/O  peripherals. 
It  does  not  include  a hardware  floating-point  processor  nor  an  optimized 
FORTRAN  compiler.  Memory  interleaving  permits  an  effective  cycle  time  of 
450  n^.  Running  the  same  matrices  and  the  same  programs  on  the  11  15  as 
were  run  on  the  370/ lb8  showed  a decrease  in  speed  by  about  a factor  of  I5u. 

Thus  only  a * 5 matrix  (N=2),  could  be  analyzed  in  1 < >0  ms . If  we  compare 
the  largest  matrix  ( N = 2 0 ) and  ask  for  the  total  runiniu  time  on  each  of  the 
two  systems  the  results  would  be: 

IBM  370/168  (N=20)  T ' ,25s 

I"  f PUP  1 1/45  (N=20)  T i 40s 

Of  course,  computation  on  the  11/45  could  be  substantially  improved  by  the 
fol  1 ow  i ng : 
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2.  Floating-point  hardware  and  optimized  FORTRAN,  or 

3.  Double-precision  integer  hardware  and  optimized  machine 
language  code. 

It  is  the  author's  experience  that  use  of  the  third  approach  often  leads  to 
improvement  on  the  order  of  100  in  speed  thus  bringing  it  lose  to  the  IBM 
system  in  performance.  Ultimately  this  is  at  the  sacrifice  of  accuracy,  but 
as  discussed  in  an  earlier  section  the  issue  of  accuracy  is  not  critical  here- 
given  32  bit  arithmetic. 

5.0  ALTERNATIVES  FOR  IMPROVEMENT  OF  SYSTEM  PERFORMANCE 

Two  basic  approaches  are  available  to  improve  system  performance.  The 
first  involves  a greater  use  of  parallel  processing  to  perform  the  eigenanal 
ysis.  The  second  involves  a modification  of  the  problem  statement  to  a dynamic 
strategy  so  that  not  all  eigenvectors  are  continually  being  determined. 

5 . 1 Parallel  Processing 

The  Householder  approach  naturally  suggests  the  use  of  pipeline  pro- 
cessing techniques  to  achieve  an  increase  in  speed.  Two  processors  can  be 
used;  one  to  perform  the  reduction  to  tridiagonal  form  and  the  other  to  perfor" 
the  eventual  eigenanalysis.  At  best,  however,  this  leads  to  a decrease  in  time 
by  a factor  of  2 if  both  processes  take  the  same  time,  (liven  the  low-cost  of 
modern  processors  such  as  the  11/45  and  the  use  of  shared  memories  this  is 
certainly  a reasonable  first  improvement. 

Reviewing  the  details  of  the  algorithm  provides  another  approach  to 
decreasing  the  computation  time.  At  a number  of  places  in  this  memorandum 
we  have  used  the  vector  dot  product  (VDP)  as  an  example  of  a typical  calculi 
tion.  In  fact,  the  VDP  is  used  extensively  in  the  Householder  analysis  ami 
this  leads  us  to  consider  if  we  might  improve  its  performance  on  our  target 
machine.  The  time  required  (t^.^p)  is  obviously  on  the  order  of  N times  the 
time  required  for  one  multiply  and  add  operation  (MAUI: 
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tVL)P 


'V  N (MAD) 


To  improve  this  number  we  have  two  choices: 


1.  Build  a special  purpose  dot-product  calculator  that  is  given  the 
starting  address  and  length  of  the  two  vectors  and  uses  direct  memory  access 
(DMA)  to  acquire  and  compute  the  required  result.  Instead  of  N multiplies  and 
adds  we  have  N memory  fetches  (assuming  a super  high  speed  multiplier  in  the 
special  purpose  calculator).  Since  a MAD  takes  about  ? to  9 memory  cycles  this 
implies  a time  reduction  of  that  amount. 

2.  Build  a ful 1 -paral lei  vector  dot-product  calculation  that  has 
special  simultaneous  access  to  all  the  elements  of  each  vector.  Assuming  a 
full  set  of  multipliers  (N)  the  number  of  cycles  would  be  proportional  to 

log.,N  adds  (this  is  based  on  an  initial  model  that  looks  like  the  following 

- >■  — ► 

for  (x  * y)  when  N=4) . 


Of  course,  this  is  the  costliest  solution  hut  it  provide--  I i n-du  tic: 
computations  from  the  N'  noted  in  an  earlier  section  t - about  . lor  Inc 
A this  can  mean  a significant  reduction  in  computation  time. 


5.2 


Dynamic  Strategies 


In  many  instances  where  the  eigenvector  weights  that  we  are  deter- 
mining are  not  all  continually  changing,  it  is  inappropriate  to  continue 
solving  for  all  the  eigenvectors.  Alternatively  we  might  only  wish  to  solve 
for  the  eigenvector  associated  with  the  biggest  (or  smallest)  set  of  eigen- 
vectors . 

Where  these  two  possibilities  exist  the  use  of  the  Sturm  sequence  approach 
[1-3,7]  provides  the  solution  to  just  the  eigenvalues  at  a further  increase  in 
speed.  The  desired  eigenvectors  may  then  h<  determined  using  seveial  diffei 
ent  techniques  (inverse  iteration,  power  method  with  deflation,  (3]i.  It  tin 
number  of  eigenvectors  that  need  to  be  found  is  small  compared  to  then  a 
further  savings  in  computation  time  will  exist.  If,  however,  we  do  not  know, 
a priori,  how  many  eigenvectors  we  will  need  until  we  have  looked  at  the 
eigenvalues,  then  we  can  only  talk  about  the  expected  computation  time.  The 
expected  computation  time  requires  .1  probabilistic  model  for  the  number  of 
eigenvalues  that  will  need  to  be  solved  at  an>  time.  The  determining  of  an 
appropriate  model  could  form  the  basis  for  further  work  on  this  problem. 
Complicated  strategies,  like  following  the  trend  in  the  number  of  eigenvectors 
that  need  determination,  might  also  provide  incremental  savings  in  computa- 
tion time. 

o .  0 SUMMARY  AND  CONCLUSIONS 

1.  The  Givens-Householder  procedure  for  the  analysis  of  Uermitian 
matrices  provided  the  fastest,  most  accurate  method  for  determining  the  eigen- 
vectors and  eigenvalues. 

2.  (liven  a requirement  of  accuracy  to  12  hits,  32  bits  represent  i 
convenient  sice  for  mathematical  operation;  that  is,  double  precision  integei 
on  a lo  bit  minicomputer. 

3.  If  the  entire  operation  is  coded  in  machine  language  and  use: 

hardware  multiply/divide  then  one  can  expect  to  be  able  to  process  10  10 

(N  10)  matrices  in  100  ms  intervals.  The  sice  of  other  matrices  that  mav  N 


processed  given  other  time  intervals  may  be  determined  from  Fig.  3 and  the 
results  of  Sections  4.4  and  5.1. 

4.  If  higher  speed  is  required  then  more  parallel  processing  must 
i>e  used.  In  addition  dynamic  strategies  may  be  employed  when  only  a limited 
number  of  eigenvectors  need  be  determined. 
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